We show several ways to round a real matrix to an integer one such that
the rounding errors in all rows and columns as well as the whole matrix
are less than one. This is a classical problem with applications in many
fields, in particular, statistics.

We improve earlier solutions of different authors in two ways. For
rounding matrices of size $m \times n$, we reduce the runtime from $O((mn)^2)$
to $O(mn \log(mn))$. Second, our roundings also have a rounding error of
less than one in all initial intervals of rows and columns. Consequently,
arbitrary intervals have an error of at most two. This is particularly useful
in the statistics application of controlled rounding.

The same result can be obtained via (dependent) randomized rounding.
This has the additional advantage that the rounding is unbiased, that
is, for all entries $y_{ij}$ of our rounding, we have $E(y_{ij}) = x_{ij}$, where $x_{ij}$ is
the corresponding entry of the input matrix.



1 Introduction

In this paper, we analyze a rounding problem with strong connections to
statistics, but also to different areas in discrete mathematics, computer
science, and operations research. We show how to round a matrix to an integer
one such that rounding errors in intervals of rows and columns are small.

Let $m, n$ be positive integers. For some set $S$, we write $S^{m \times n}$ to denote the
set of $m \times n$ matrices with entries in $S$. For real numbers $a, b$ let
$[a..b] := \{z \in \mathbb{Z} \mid a \le z \le b\}$. We show the following.
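
Theorem 1. For all $X \in [0,1)^{m \times n}$ a rounding $Y \in \{0,1\}^{m \times n}$ such that

$$\forall b \in [1..n],\ i \in [1..m]:\quad \left|\sum_{j=1}^{b} (x_{ij} - y_{ij})\right| < 1,$$

$$\forall b \in [1..m],\ j \in [1..n]:\quad \left|\sum_{i=1}^{b} (x_{ij} - y_{ij})\right| < 1,$$

$$\left|\sum_{i=1}^{m} \sum_{j=1}^{n} (x_{ij} - y_{ij})\right| < 1$$

can be computed in time $O(mn \log(mn))$.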
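The three conditions of Theorem 1 are easy to verify mechanically for a given pair of matrices. The following is a minimal sketch of such a check; the function name `is_valid_rounding` and the use of exact `Fraction` arithmetic are illustrative assumptions and not part of the paper.

```python
from fractions import Fraction
from itertools import accumulate


def is_valid_rounding(X, Y):
    """Check the conditions of Theorem 1 for a candidate rounding Y of X.

    X and Y are lists of rows of equal shape. Hypothetical helper, not
    taken from the paper.
    """
    m, n = len(X), len(X[0])
    # Entrywise rounding errors x_ij - y_ij, computed exactly.
    D = [[Fraction(X[i][j]) - Y[i][j] for j in range(n)] for i in range(m)]
    entries_ok = all(y in (0, 1) for row in Y for y in row)
    # accumulate() yields the errors of all initial intervals of a row.
    rows_ok = all(abs(s) < 1 for row in D for s in accumulate(row))
    # zip(*D) iterates over the columns of D.
    cols_ok = all(abs(s) < 1 for col in zip(*D) for s in accumulate(col))
    total_ok = abs(sum(map(sum, D))) < 1
    return entries_ok and rows_ok and cols_ok and total_ok
```

For the all-$\frac{1}{2}$ matrix of size $2 \times 2$, the rounding with ones on the diagonal passes this check, whereas rounding the whole first row up fails it because the first row then has error one.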

This result extends the famous rounding lemma of Baranyai [3] and several
results on controlled rounding in statistics by Bacharach [2] and Causey, Cox
and Ernst [7].

1.1 Baranyai's Rounding Lemma and Applications in Statistics

Baranyai [3] used a weaker version of Theorem 1 to obtain his well-known results
on coloring and partitioning complete uniform hypergraphs. He showed that
any matrix can be rounded such that the errors in all rows, all columns and the
whole matrix are less than one. He used a formulation as a flow problem to prove
this statement. This yields a runtime inferior to the bound in Theorem 1.
However, algorithmic issues were not his focus.

In statistics, Baranyai's result was independently obtained by Bacharach [2]
(in a slightly weaker form) and again independently by Causey, Cox and Ernst [7].
There are two statistical applications for such rounding results. Note first that
instead of rounding to integers, our result also applies to rounding to multiples
of any other base (e.g., multiples of 10). Such a rounding can be used to improve
the readability of data tables.

The main reason, however, to apply such a rounding procedure is
confidentiality protection. Frequency counts that directly or indirectly disclose
small counts may permit the identification of individual respondents. There are
various methods to prevent this [25], one of which is controlled rounding [?]. Here,
one tries to round an $(m+1) \times (n+1)$-table $\tilde{X}$ given by

$$\tilde{X} = \begin{pmatrix} (x_{ij})_{i=1 \ldots m,\ j=1 \ldots n} & \left(\sum_{j=1}^{n} x_{ij}\right)_{i=1 \ldots m} \\ \left(\sum_{i=1}^{m} x_{ij}\right)_{j=1 \ldots n} & \sum_{i=1}^{m} \sum_{j=1}^{n} x_{ij} \end{pmatrix}$$

to an $(m+1) \times (n+1)$-table $\tilde{Y}$ such that additivity is preserved, i.e., the last
row and column of $\tilde{Y}$ contain the associated totals of $\tilde{Y}$. In our setting we round
the $m \times n$-matrix $X$ defined by the $mn$ inner cells of the table $\tilde{X}$ to obtain a
controlled rounding.

The additivity in the rounded table allows one to derive information on the row
and column totals of the original table. In contrast to other rounding algorithms,
our result also permits retrieving further reliable information from the rounded
matrix, namely on the sums of consecutive elements in rows or columns. Such
queries may occur if there is a linear ordering on statistical attributes. Here is
an example. Let $x_{ij}$ be the number of people in country $i$ that are $j$ years old. Say
$Y$ is such that $\frac{1}{1000} Y$ is a rounding of $\frac{1}{1000} X$ as in Theorem 1. Now $\sum_{j=20}^{40} y_{ij}$ is
the number of people in country $i$ that are between 20 and 40 years old, apart
from an error of less than 2000. Note that such guarantees are not provided by
the results of Baranyai [3], Bacharach [2], and Causey, Cox and Ernst [7].
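The error bound of 2000 in this example can be spelled out as a short calculation. It is a sketch under the assumption that the scaled matrices $\frac{1}{1000} X$ and $\frac{1}{1000} Y$ have errors of less than one in all initial row intervals, so that an arbitrary interval, being the difference of two initial intervals, has an error of less than two.

```latex
\left| \sum_{j=20}^{40} \left( \frac{x_{ij}}{1000} - \frac{y_{ij}}{1000} \right) \right|
  \le \left| \sum_{j=1}^{40} \frac{x_{ij} - y_{ij}}{1000} \right|
    + \left| \sum_{j=1}^{19} \frac{x_{ij} - y_{ij}}{1000} \right|
  < 1 + 1 = 2
\quad\Longrightarrow\quad
\left| \sum_{j=20}^{40} (x_{ij} - y_{ij}) \right| < 2000 .
```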

1.2 Unbiased Rounding 

In Section 4 we present a randomized algorithm computing roundings as in
Theorem 1. It has the additional property that each matrix entry is rounded
up with probability equal to its fractional value. This is known as randomized
rounding [20] in computer science and as unbiased controlled rounding [8, 15]
in statistics. Here, a controlled rounding is computed such that the expected
value of each table entry (including the totals) equals its fractional value in the
original table.

To state our result more precisely, we introduce the following notation. For
$x \in \mathbb{R}$ write $\lfloor x \rfloor := \max\{z \in \mathbb{Z} \mid z \le x\}$, $\lceil x \rceil := \min\{z \in \mathbb{Z} \mid z \ge x\}$ and
$\{x\} := x - \lfloor x \rfloor$.

Definition 2. Let $x \in \mathbb{R}$. A random variable $y$ is called a randomized rounding
of $x$, denoted $y \approx x$, if $\Pr(y = \lfloor x \rfloor + 1) = \{x\}$ and $\Pr(y = \lfloor x \rfloor) = 1 - \{x\}$.
For a matrix $X \in \mathbb{R}^{m \times n}$, we call an $m \times n$ matrix-valued random variable $Y$ a
randomized rounding of $X$ if $y_{ij} \approx x_{ij}$ for all $i \in [1..m]$, $j \in [1..n]$.
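For a single number, Definition 2 translates directly into code. The sketch below draws one sample of a randomized rounding; the function name `randomized_round` and the optional `rng` parameter are illustrative assumptions. Rounding each entry of a matrix independently in this way does not yield the dependent rounding of the paper, since independent draws do not control the errors in rows and columns.

```python
import math
import random


def randomized_round(x, rng=random):
    """Return floor(x) + 1 with probability {x}, and floor(x) otherwise."""
    lo = math.floor(x)
    frac = x - lo  # fractional part {x}, in [0, 1)
    # rng.random() is uniform on [0, 1), so the comparison succeeds
    # with probability exactly frac (up to floating-point resolution).
    return lo + 1 if rng.random() < frac else lo
```

An integer input has fractional part zero and is therefore always returned unchanged.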

We then get the following randomized version of Theorem 1.

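Theorem 3. Let $X \in [0,1)^{m \times n}$ be a matrix having entries of binary length at
most $\ell$. Then a randomized rounding $Y$ fulfilling the additional constraints that

$$\forall b \in [1..n],\ i \in [1..m]:\quad \sum_{j=1}^{b} x_{ij} \approx \sum_{j=1}^{b} y_{ij},$$

$$\forall b \in [1..m],\ j \in [1..n]:\quad \sum_{i=1}^{b} x_{ij} \approx \sum_{i=1}^{b} y_{ij},$$

$$\sum_{i=1}^{m} \sum_{j=1}^{n} x_{ij} \approx \sum_{i=1}^{m} \sum_{j=1}^{n} y_{ij}$$

can be computed in time $O(mn\ell)$.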



For a matrix with arbitrary entries $x_{ij} := \sum_{d=1}^{\ell} x_{ij}^{(d)} 2^{-d} + x'_{ij}$, where $x'_{ij} < 2^{-\ell}$
and $x_{ij}^{(d)} \in \{0, 1\}$ for $i \in [1..m]$, $j \in [1..n]$, $d \in [1..\ell]$, we may use the $\ell$ highest
bits to get an approximate randomized rounding. If (before doing so) we round
the remaining part $x'_{ij}$ of each entry to $2^{-\ell}$ with probability $2^{\ell} x'_{ij}$ and to 0
otherwise, we still have that $Y \approx X$, but we introduce an additional error of at
most $2^{-\ell} mn$ in the constraints of Theorem 3.
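The preprocessing step described above, which replaces an entry by a nearby multiple of $2^{-\ell}$ without changing its expectation, can be sketched as follows. The function name `truncate_unbiased` and its parameters are illustrative assumptions; exact `Fraction` arithmetic is used so that the bit positions are unambiguous.

```python
from fractions import Fraction
import math
import random


def truncate_unbiased(x, ell, rng=random):
    """Replace x by a multiple of 2**-ell whose expected value equals x."""
    x = Fraction(x)
    unit = Fraction(1, 2 ** ell)
    head = math.floor(x / unit) * unit  # the part given by the ell highest bits
    rest = x - head                     # remaining part, 0 <= rest < 2**-ell
    # Round the remaining part up to 2**-ell with probability 2**ell * rest,
    # and down to 0 otherwise.
    if rng.random() < float(rest / unit):
        return head + unit
    return head
```

The expected value of the result is `head + (rest / unit) * unit`, which equals `x`, and the result differs from `x` by less than $2^{-\ell}$.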

1.3 Other Applications 

One of the most basic rounding results states that any sequence $x_1, \ldots, x_n$ of
numbers can be rounded to an integer one $y_1, \ldots, y_n$ such that the rounding
errors $\left|\sum_{j=a}^{b} (x_j - y_j)\right|$ are less than one for all $a, b \in [1..n]$. Such roundings
can be computed efficiently in linear time by a one-pass algorithm resembling
Kadane's scanning algorithm (described in Bentley's Programming Pearls [?]).
Extensions in different directions have been obtained in [11, 12, 17, 21, ?]. This
rounding problem has found a number of applications, among others in image
processing [?].
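One simple way to obtain such a sequence rounding in a single pass is to round every prefix sum to the nearest integer and output the differences of consecutive rounded prefix sums. The sketch below follows this idea; it is an assumption that this matches the one-pass algorithm referred to above, and the name `round_sequence` is illustrative. All prefix errors then lie in $[-\frac{1}{2}, \frac{1}{2})$, so the error of any interval, being a difference of two prefix errors, is less than one in absolute value.

```python
from fractions import Fraction
import math


def round_sequence(xs):
    """Round xs so that every interval sum changes by less than one."""
    ys = []
    prefix = Fraction(0)  # exact prefix sum x_1 + ... + x_j
    prev = 0              # rounded prefix sum of the previous step
    for x in xs:
        prefix += Fraction(x)
        # Nearest integer to the prefix sum (ties are rounded up).
        cur = math.floor(prefix + Fraction(1, 2))
        ys.append(cur - prev)  # always floor(x) or ceil(x)
        prev = cur
    return ys
```

Applied to four entries of value $\frac{1}{2}$, this produces the alternating sequence 1, 0, 1, 0.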

Theorem 1 extends this result to two-dimensional sequences. Here the rounding
error in arbitrary intervals of a row or column is less than two. In [14] a
lower bound of 1.5 is shown for this problem. Thus an error of less than one as
in the one-dimensional case cannot be achieved.

Rounding a matrix while considering the errors in column sums and partial
row sums also arises in scheduling [?]. For this, however, one does
not need our result in full generality. It suffices to use the linear-time one-pass
algorithm given in [14]. This algorithm rounds a matrix having unit column
sums and can be extended to compute a quasi rounding for arbitrary matrices.
While this algorithm keeps the error in all initial row intervals small, for columns
only the error over the whole column is considered.

1.4 Knuth's Two-way Rounding 

In [17], Knuth showed how to round a sequence of $n$ real numbers $x_i$ to
$y_i \in \{\lfloor x_i \rfloor, \lceil x_i \rceil\}$ such that for two given permutations $\sigma_1, \sigma_2 \in S_n$, we have both
$\left|\sum_{i=1}^{k} (x_{\sigma_1(i)} - y_{\sigma_1(i)})\right| \le n/(n+1)$ and $\left|\sum_{i=1}^{k} (x_{\sigma_2(i)} - y_{\sigma_2(i)})\right| \le n/(n+1)$ for
all $k$. Knuth's proof uses integer flows in a certain network [?]. On account of
this his worst-case runtime is quadratic.

One application Knuth mentioned in [17] is that of matrix rounding. For
this, simply choose a permutation $\sigma_1$ that enumerates the $x_{ij}$ row by row, and a
permutation $\sigma_2$ that enumerates the $x_{ij}$ column by column. Applying Knuth's
algorithm to these permutations gives a rounding with errors smaller than one
in all initial row and column intervals.



2 Preliminaries 

In this section, we provide two easy extensions of the result stated in the
introduction. First, we immediately obtain rounding errors of less than two in
arbitrary intervals in rows and columns. This is supplied by the following lemma.

Lemma 4. Let $Y$ be a rounding of a matrix $X$ such that the errors $\left|\sum_{j=1}^{b} (x_{ij} - y_{ij})\right|$
in all initial intervals of rows are at most $d$. Then the errors in arbitrary
intervals of rows are at most $2d$, that is, for all $i \in [1..m]$ and all $1 \le a \le b \le n$,

$$\left|\sum_{j=a}^{b} (x_{ij} - y_{ij})\right| \le 2d.$$

This also holds for column intervals, i.e., if the errors $\left|\sum_{i=1}^{b} (x_{ij} - y_{ij})\right|$ in all
initial intervals of columns are at most $d'$, then the errors $\left|\sum_{i=a}^{b} (x_{ij} - y_{ij})\right|$ in
arbitrary intervals of columns are at most $2d'$.

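Proof. Let $i \in [1..m]$ and $1 \le a \le b \le n$. Then

$$\left|\sum_{j=a}^{b} (x_{ij} - y_{ij})\right| = \left|\sum_{j=1}^{b} (x_{ij} - y_{ij}) - \sum_{j=1}^{a-1} (x_{ij} - y_{ij})\right| \le \left|\sum_{j=1}^{b} (x_{ij} - y_{ij})\right| + \left|\sum_{j=1}^{a-1} (x_{ij} - y_{ij})\right| \le 2d.$$

□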



From now on, we will only consider matrices having integral row and column 
sums. This is justified by the following lemma. 



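Lemma 5. Assume that for any $X \in \mathbb{R}^{m \times n}$ with integral column and row sums
a rounding $Y \in \mathbb{Z}^{m \times n}$ such that

$$\forall b \in [1..n],\ i \in [1..m]:\quad \left|\sum_{j=1}^{b} (x_{ij} - y_{ij})\right| < 1, \tag{1}$$

$$\forall b \in [1..m],\ j \in [1..n]:\quad \left|\sum_{i=1}^{b} (x_{ij} - y_{ij})\right| < 1 \tag{2}$$

can be computed in time $T(m, n)$. Then for all $X \in \mathbb{R}^{m \times n}$ with arbitrary column
and row sums a rounding $Y \in \mathbb{Z}^{m \times n}$ satisfying (1), (2) and

$$\left|\sum_{i=1}^{m} \sum_{j=1}^{n} (x_{ij} - y_{ij})\right| < 1 \tag{3}$$

can be computed in time $T(m+1, n+1) + O(mn)$.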



Proof. Given an arbitrary matrix $X \in \mathbb{R}^{m \times n}$, we add an extra row taking what
is missing towards integral column sums and add an extra column taking what
is missing towards integral row sums. Hence, let $\tilde{X} \in \mathbb{R}^{(m+1) \times (n+1)}$ be such
that

$$\tilde{x}_{ij} = x_{ij} \quad \text{for all } i \in [1..m],\ j \in [1..n],$$

$$\tilde{x}_{m+1,j} = \left\lfloor \sum_{i=1}^{m} x_{ij} \right\rfloor + 1 - \sum_{i=1}^{m} x_{ij} \quad \text{for all } j \in [1..n],$$

$$\tilde{x}_{i,n+1} = \left\lfloor \sum_{j=1}^{n} x_{ij} \right\rfloor + 1 - \sum_{j=1}^{n} x_{ij} \quad \text{for all } i \in [1..m],$$

$$\tilde{x}_{m+1,n+1} = \sum_{i=1}^{m} \sum_{j=1}^{n} x_{ij}.$$

Clearly, $\tilde{X}$ has integral row and column sums. Therefore it can be rounded to
$\tilde{Y} \in \mathbb{Z}^{(m+1) \times (n+1)}$ satisfying (1) and (2) in time $T(m+1, n+1)$.

For (3), observe that if a row (resp. column) sum is integral, the rounding
error in the row (resp. column) is 0, since it is an integer of absolute value
less than one. Then the rounding error in the whole matrix is also 0 if all row
and column sums are integral. Using this and the triangle inequality, we get
inequality (3) as follows.






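$$\left|\sum_{i=1}^{m} \sum_{j=1}^{n} (\tilde{x}_{ij} - \tilde{y}_{ij})\right| = \left|\sum_{i=1}^{m+1} \sum_{j=1}^{n+1} (\tilde{x}_{ij} - \tilde{y}_{ij}) - \sum_{i=1}^{m+1} (\tilde{x}_{i,n+1} - \tilde{y}_{i,n+1}) - \sum_{j=1}^{n+1} (\tilde{x}_{m+1,j} - \tilde{y}_{m+1,j}) + (\tilde{x}_{m+1,n+1} - \tilde{y}_{m+1,n+1})\right|$$

$$\le 0 + 0 + 0 + |\tilde{x}_{m+1,n+1} - \tilde{y}_{m+1,n+1}| < 1.$$

By setting $y_{ij} := \tilde{y}_{ij}$ for all $i \in [1..m]$ and $j \in [1..n]$, we obtain the desired
rounding $Y \in \mathbb{Z}^{m \times n}$. □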
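The padding step of this proof can be sketched in a few lines. The code below appends one column and one row so that all row and column sums become integral; it uses one possible choice of the extra entries (floor plus one minus the sum, and the total of the matrix in the corner), which is an assumption where the extracted text is not fully legible. The name `pad_to_integral_sums` is illustrative.

```python
from fractions import Fraction
import math


def pad_to_integral_sums(X):
    """Append a column and a row so that all row and column sums are integral."""
    X = [[Fraction(x) for x in row] for row in X]

    def missing(s):
        # Amount in (0, 1] that lifts the sum s to the next integer above it.
        return math.floor(s) + 1 - s

    # Extra column: what is missing towards integral row sums.
    padded = [row + [missing(sum(row))] for row in X]
    # Extra row: what is missing towards integral column sums.
    last = [missing(sum(col)) for col in zip(*X)]
    # Corner entry: the total of X makes the new row and column integral too.
    last.append(sum(map(sum, X)))
    padded.append(last)
    return padded
```

After rounding the padded matrix, dropping the last row and column again gives the rounding of the original matrix, as in the last step of the proof.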



3 Bitwise Rounding 

In this section, we present an alternative approach which will lead to a
superior runtime. It uses a classical result on rounding problems, namely, that the
problem of rounding arbitrary numbers can be reduced to the one of rounding
half-integral numbers. For $X \in \{0, \frac{1}{2}\}^{m \times n}$, our rounding problem turns out to
be much simpler. In fact, it can be solved in linear time.



3.1 The Binary Rounding Method 

The following rounding method was introduced by Beck and Spencer [?] in 1984.
They used it to prove the existence of two-colorings of $\mathbb{N}$ having small
discrepancy in all arithmetic progressions of arbitrary length and bounded difference.

Given arbitrary numbers that have to be rounded, they use their binary
expansion and (assuming all of them to be finite) round 'digit by digit'. To do
the latter, they only need to understand the corresponding rounding problem
for half-integral numbers. That is, an $\ell$-bit number $x = x' + \frac{1}{2} x''$, $x' \in \{0, \frac{1}{2}\}$,
can be recursively rounded by rounding the $(\ell - 1)$-bit number $x''$ to $y'' \in \{0, 1\}$
and then rounding $x' + \frac{1}{2} y'' \in \{0, \frac{1}{2}, 1\}$ to $y \in \{0, 1\}$. The resulting rounding
errors are at most twice the ones incurred by the half-integral roundings.
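The factor of two can be traced through a short calculation. The sketch below uses notation that is not in the original: $\tilde{x} := x' + \frac{1}{2} y''$ is the intermediate half-integral value, $D$ bounds the error of each half-integral rounding over the intervals of interest, and $E_\ell$ denotes the largest such error after rounding $\ell$-bit numbers. Summing the first identity over an interval gives the recursion.

```latex
y - x = (y - \tilde{x}) + (\tilde{x} - x) = (y - \tilde{x}) + \tfrac{1}{2}\,(y'' - x'')
\quad\Longrightarrow\quad
E_\ell \le D + \tfrac{1}{2} E_{\ell-1}, \qquad E_1 \le D,

E_\ell \le D \sum_{k=0}^{\ell-1} 2^{-k} = 2\,(1 - 2^{-\ell})\, D < 2D .
```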

If some numbers do not have a finite binary expansion, one can use a
sufficiently large finite length approximation. To get rid of additional errors caused
by this, we invoke a slight refinement of the binary rounding method. In [?]
it was proven that the extra factor of two can be reduced to an extra factor of
$2(1 - \frac{1}{2r})$, where $r$ is the number of rounding errors we want to keep small.

In our setting, the number of rounding errors is the number of all initial row
and column intervals, i.e., $r = 2mn$. In summary, we have the following.



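Lemma 6. Assume that for any $X \in \{0, \frac{1}{2}\}^{m \times n}$ a rounding $Y \in \{0, 1\}^{m \times n}$ can
be computed in time $T$ that satisfies

$$\forall b \in [1..n],\ i \in [1..m]:\quad \left|\sum_{j=1}^{b} (x_{ij} - y_{ij})\right| \le D,$$

$$\forall b \in [1..m],\ j \in [1..n]:\quad \left|\sum_{i=1}^{b} (x_{ij} - y_{ij})\right| \le D.$$

Then for all $\ell \in \mathbb{N}$ and $X \in [0, 1)^{m \times n}$ a rounding $Y \in \{0, 1\}^{m \times n}$ such that

$$\forall b \in [1..n],\ i \in [1..m]:\quad \left|\sum_{j=1}^{b} (x_{ij} - y_{ij})\right| \le 2\left(1 - \frac{1}{4mn}\right) D + 2^{-\ell} b,$$

$$\forall b \in [1..m],\ j \in [1..n]:\quad \left|\sum_{i=1}^{b} (x_{ij} - y_{ij})\right| \le 2\left(1 - \frac{1}{4mn}\right) D + 2^{-\ell} b$$

can be computed in time $O(\ell T)$.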



3.2 Rounding Half-Integral Matrices 

It remains to show how to solve the rounding problem for half-integral matrices.
Based on Lemma 5 we can assume integrality of row and column sums.

Here is an outline of our approach. For each row and column, we consider
the sequence of its $\frac{1}{2}$-entries and partition them into disjoint pairs of neighbors.
From the two $\frac{1}{2}$s forming such a pair, exactly one is rounded to 1 and the other
to 0. Thus, if such a pair is contained in an initial interval, it does not contribute
to the rounding error.

To make the idea precise, assume some row contains exactly $2K$ entries of
value $\frac{1}{2}$. We call the $(2k-1)$-th and $(2k)$-th $\frac{1}{2}$-entry of this row a row pair, for
all $1 \le k \le K$. The $\frac{1}{2}$s of a row pair are mutually referred to as row neighbors.
Similarly, we define column pairs and column neighbors. Figure 1(a) shows
a half-integral matrix together with row and column pairs marked by boxes.
Since each $\frac{1}{2}$ belongs to a row pair and a column pair, the task of rounding is
non-trivial.

Our solution makes use of an auxiliary graph $\mathcal{G}_X$ which contains the necessary
information about row and column neighbors. Each $\frac{1}{2}$-entry is represented
by a vertex that is labeled with the corresponding matrix indices. Each pair is
represented by an edge connecting the vertices that correspond to the paired $\frac{1}{2}$s.
Figure 1(b) shows the auxiliary graph that belongs to the matrix of Figure 1(a).

Figure 1: Example for the construction of an auxiliary graph. (a) Input matrix
$X$ with its row and column pairs. (b) Auxiliary graph $\mathcal{G}_X$. Vertices are labeled
with matrix indices and edges connect vertices of row and column pairs. $\mathcal{G}_X$ is
a disjoint union of even cycles.



We collect some properties of this auxiliary graph. 
Lemma 7. Let $X \in \{0, \frac{1}{2}\}^{m \times n}$ be a matrix with integral row and column sums.

(a) Every vertex of $\mathcal{G}_X$ has degree 2.

(b) $\mathcal{G}_X$ is a disjoint union of even cycles.

(c) $\mathcal{G}_X$ is bipartite.

Proof. (a) Because of the integrality of the row and column sums, the number
of $\frac{1}{2}$-entries in each row and column is even. Hence each $\frac{1}{2}$-entry has a row and
a column neighbor. In consequence, each vertex is incident with exactly two
edges. (b) The edge sequence of a path in $\mathcal{G}_X$ corresponds to an alternating
sequence of row and column pairs. Therefore any cycle in $\mathcal{G}_X$ consists of an
even number of edges. Since each vertex has degree two, $\mathcal{G}_X$ is a disjoint union
of cycles. (c) Clearly, every even cycle is bipartite. □



With this result, we are able to find the desired roundings. 



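Lemma 8. Let $X \in \{0, \frac{1}{2}\}^{m \times n}$ and let $V_0 \cup V_1$ be a bipartition of $\mathcal{G}_X$. Define
$Y = (y_{ij}) \in \{0, 1\}^{m \times n}$ by

$$y_{ij} = \begin{cases} 0, & \text{if } x_{ij} = 0, \\ 0, & \text{if } x_{ij} = \frac{1}{2} \text{ and } (i,j) \in V_0, \\ 1, & \text{if } x_{ij} = \frac{1}{2} \text{ and } (i,j) \in V_1. \end{cases}$$

Then $Y$ has the property that

$$\forall b \in [1..n],\ i \in [1..m]:\quad \left|\sum_{j=1}^{b} (x_{ij} - y_{ij})\right| \le \frac{1}{2}, \tag{4}$$

$$\forall b \in [1..m],\ j \in [1..n]:\quad \left|\sum_{i=1}^{b} (x_{ij} - y_{ij})\right| \le \frac{1}{2}. \tag{5}$$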



Proof. Because 0s of $X$ are maintained in $Y$, it suffices to consider $\frac{1}{2}$-entries to
determine the rounding error in initial intervals. Since the rounded values for
the $(2k-1)$-th and $(2k)$-th $\frac{1}{2}$-entry sum up to 1 by construction, there is no
error in initial intervals that contain an even number of $\frac{1}{2}$s, and an error of $\frac{1}{2}$ if
they contain an odd number of $\frac{1}{2}$s. □



After these considerations, we are able to present an algorithm that solves the
problem in two steps: first we compute the auxiliary graph and afterwards the
output matrix. To construct $\mathcal{G}_X$, we traverse the input matrix $X$ column by
column from left to right. Generating the labeled vertices is trivial.
The column neighbors are detected just by numbering the $\frac{1}{2}$-entries within a
column from top to bottom. When there are $2k$ such entries, we insert an edge
between the vertices with numbers $2i - 1$ and $2i$ for all $1 \le i \le k$. The strategy
to detect row neighbors is the same, but we need more information. Therefore
we store for each row the parity of its $\frac{1}{2}$-entries so far and, if the parity is odd,
additionally a pointer to the last occurrence of $\frac{1}{2}$ in this row. Then, if the current
$\frac{1}{2}$ is an even occurrence, we have a pointer to the preceding $\frac{1}{2}$ and are able to
insert an edge between the corresponding vertices in $\mathcal{G}_X$.

The output matrix $Y$ can be computed from $X$ as follows. Every 0 in $X$ is
kept and every $\frac{1}{2}$-sequence that corresponds to a cycle in $\mathcal{G}_X$ is substituted by
an alternating 0-1 sequence. By Lemma 7 this is always possible. It does not
matter which of the two alternating 0-1 sequences we choose.

The graph $\mathcal{G}_X$ can be realized with adjacency lists (the vertex degree is
always 2). The additional information per row can be realized by a simple
pointer-array of length $m$ (a special nil-value indicates even parity).

Since the runtime of each step is bounded by the size of the input matrix, the
entire algorithm takes time $O(mn)$. In addition to the constant amount of space
we need for each of the $m$ rows, we store all $k$ entries of value $\frac{1}{2}$ in the auxiliary
graph. This leads to a total space consumption of $O(m + k)$. Summarizing the
above, we obtain the following lemma.



Lemma 9. Let $X \in \{0, \frac{1}{2}\}^{m \times n}$. Then a rounding $Y \in \{0, 1\}^{m \times n}$ satisfying the
inequalities (4) and (5) can be computed in time $O(mn)$.
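The two steps of the algorithm, building the auxiliary graph and two-coloring its cycles, can be sketched as follows. This is an illustrative implementation under stated assumptions: it pairs neighbors by scanning rows and columns separately instead of using the single column-by-column pass with a pointer array described above, and the names `round_half_integral`, `pair_up` and `nbr` are hypothetical. It expects a matrix with entries 0 and $\frac{1}{2}$ whose row and column sums are integral.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def round_half_integral(X):
    """Round X in {0, 1/2}^(m x n) with integral row and column sums."""
    m, n = len(X), len(X[0])
    nbr = {}  # vertex (i, j) -> its row neighbor and its column neighbor

    def pair_up(cells):
        # Pair the (2k-1)-th with the (2k)-th 1/2-entry along one line.
        halves = [c for c in cells if X[c[0]][c[1]] == HALF]
        assert len(halves) % 2 == 0, "row or column sum is not integral"
        for a, b in zip(halves[0::2], halves[1::2]):
            nbr.setdefault(a, []).append(b)
            nbr.setdefault(b, []).append(a)

    for i in range(m):
        pair_up([(i, j) for j in range(n)])  # row pairs
    for j in range(n):
        pair_up([(i, j) for i in range(m)])  # column pairs

    # Two-color every cycle; neighbors always receive different colors.
    color = {}
    for start in nbr:
        if start in color:
            continue
        color[start] = 0
        stack = [start]
        while stack:
            v = stack.pop()
            for w in nbr[v]:
                if w not in color:
                    color[w] = 1 - color[v]
                    stack.append(w)

    Y = [[0] * n for _ in range(m)]  # zeros of X are kept
    for (i, j), c in color.items():
        Y[i][j] = c
    return Y
```

Because all cycles are even, the traversal never meets a conflict, and each row pair and each column pair receives exactly one 1. Starting each cycle with color 1 instead of 0 gives the other valid alternating sequence.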

3.3 Final Result 

By combining Lemma 6 and Lemma 9 we obtain the following result.

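Theorem 10. For all $\ell \in \mathbb{N}$ and $X \in [0, 1)^{m \times n}$ a rounding $Y \in \{0, 1\}^{m \times n}$ such
that

$$\forall b \in [1..n],\ i \in [1..m]:\quad \left|\sum_{j=1}^{b} (x_{ij} - y_{ij})\right| \le 1 - \frac{1}{4mn} + 2^{-\ell} b,$$

$$\forall b \in [1..m],\ j \in [1..n]:\quad \left|\sum_{i=1}^{b} (x_{ij} - y_{ij})\right| \le 1 - \frac{1}{4mn} + 2^{-\ell} b$$

can be computed in time $O(\ell mn)$.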

For $\ell > \log_2(4mn \max\{m, n\})$ we have $2^{-\ell} b < \frac{1}{4mn}$ for all $b \le \max\{m, n\}$, so
the above theorem together with Lemma 5 yields Theorem 1 in the introduction.



4 Unbiased Rounding 

In this section we give a randomized algorithm that computes a randomized
rounding satisfying Theorem 3. First observe that the $\{0, \frac{1}{2}\}$ case has a very
simple randomized solution: whenever the algorithm has to round a cycle, it
chooses one of the two alternating 0-1 sequences for this cycle uniformly at
random. Then each $x_{ij} = \frac{1}{2}$ is rounded up with probability $\frac{1}{2}$.

Now consider the output of the bitwise rounding algorithm using the
randomized rounding algorithm for the half-integral case as subroutine. We adapt
the proofs of [?] to show that this algorithm computes an unbiased controlled
rounding.

Theorem 11. Let $X \in [0, 1)^{m \times n}$ be a matrix containing entries with binary
representation of length at most $\ell$. Let $Y$ be a random variable modeling the
output of the randomized algorithm. Then $Y \approx X$ and

$$\forall b \in [1..n],\ i \in [1..m]:\quad \sum_{j=1}^{b} y_{ij} \approx \sum_{j=1}^{b} x_{ij}, \tag{6}$$

$$\forall b \in [1..m],\ j \in [1..n]:\quad \sum_{i=1}^{b} y_{ij} \approx \sum_{i=1}^{b} x_{ij}. \tag{7}$$



Proof. We prove $Y \approx X$ by induction. For $\ell = 1$ it is clear that $\Pr(y_{ij} = 1) = x_{ij}$.
If $\ell > 1$, write $x_{ij} = x'_{ij} + \frac{1}{2} x''_{ij}$, where $x'_{ij} \in \{0, \frac{1}{2}\}$ and $x''_{ij} \in [0, 1)$ has
bit-length $\ell - 1$. Let $y''_{ij}$ be the rounding computed for $x''_{ij}$. Then $\Pr(y''_{ij} = 1) = x''_{ij}$
by induction. Now the algorithm will round $\tilde{x}_{ij} := x'_{ij} + \frac{1}{2} y''_{ij} \in \{0, \frac{1}{2}, 1\}$ to $y_{ij}$.
If $y''_{ij} = 1$, then $\tilde{x}_{ij}$ will be rounded up with probability 1 if $x'_{ij} = \frac{1}{2}$ and with
probability $\frac{1}{2}$ otherwise. If, on the other hand, $y''_{ij} = 0$, then $\tilde{x}_{ij}$ will be rounded
up with probability $x'_{ij}$. Thus

$$\Pr(y_{ij} = 1) = x''_{ij} \left(\tfrac{1}{2} + x'_{ij}\right) + (1 - x''_{ij})\, x'_{ij} = x'_{ij} + \tfrac{1}{2} x''_{ij} = x_{ij}.$$

To prove equation (6), observe that $s_y := \sum_{j=1}^{b} y_{ij}$ is a rounding of
$s_x := \sum_{j=1}^{b} x_{ij}$ by Lemma 6. We also have $E(s_y) = \sum_{j=1}^{b} E(y_{ij}) = s_x$ by linearity of
expectation. But also $E(s_y) = \Pr(s_y = \lfloor s_x \rfloor) \lfloor s_x \rfloor + \Pr(s_y = \lfloor s_x \rfloor + 1)(\lfloor s_x \rfloor + 1)$,
which is only possible if $s_y \approx s_x$. The proof of (7) is analogous. □
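The induction in this proof can be checked exactly for a single entry. The sketch below computes the probability that an $\ell$-bit number is rounded up by the bitwise scheme, following the case distinction of the proof; the function name `prob_rounded_up` and the representation of the number as a list of bits are illustrative assumptions.

```python
from fractions import Fraction


def prob_rounded_up(bits):
    """Exact Pr(y = 1) for x = 0.b1 b2 ... (binary) rounded bit by bit."""
    if not bits:
        return Fraction(0)  # x = 0 is never rounded up
    x1 = Fraction(bits[0], 2)        # x' in {0, 1/2}
    p2 = prob_rounded_up(bits[1:])   # Pr(y'' = 1) for the remaining bits
    # If y'' = 1, the value x' + 1/2 is rounded up with probability 1/2 + x'.
    # If y'' = 0, the value x' is rounded up with probability x'.
    return p2 * (Fraction(1, 2) + x1) + (1 - p2) * x1
```

For the bits 1, 0, 1, that is $x = \frac{5}{8}$, the recursion yields $\frac{1}{4} \cdot 1 + \frac{3}{4} \cdot \frac{1}{2} = \frac{5}{8}$, in line with the claim $\Pr(y_{ij} = 1) = x_{ij}$.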